The Effect of Information Display Format on Multiple-Cue Judgment
Shanta P. Kerkar and William C. Howell
Rice  University 


ABSTRACT 

The  rapid  evolution  of  computer  technology  has  drawn  considerable 
attention to the manner in which information is presented on CRTs. Much of
the "engineering psychology" research to date has involved evaluation of
display parameters (formatting, color, etc.) in terms of performance measures
(e.g.,  speed,  accuracy,  information  rates)  that  presume  a  clear-cut  criterion. 
Increasingly, however, users are facing "real world" problems in which no
unequivocal criterion exists: situations, for example, in which [illegible]
are required on the basis of displayed data. Since empirical evidence on the
effects of display features in these "cognitive" tasks is sparse, three studies
were  conducted  to  explore  various  aspects  of  the  relationship.  In  all  three, 
subjects  were  required  to  combine  multiple  predictive  items  (teacher 
attributes,  applicant  test  scores)  into  overall  evaluations  (teacher 
effectiveness;  qualification  for  a  defined  position)  under  conditions  of  either 
graphic or numerical display. Using the "policy capturing"
methodology, in which multiple regression is used to model behavior, a
description of individual judgment strategies was obtained. Display format was
found to have a direct influence on the importance attached to (the "weighting
of") the separate pieces of information (viz., intelligence, etc.) in forming an
overall evaluation. Moreover, simultaneous presentation of graphic information
tended to produce holistic processing in contrast with the serial processing of
numerical information. These findings appear to have important implications
for the design of computer-based information processing systems.

INTRODUCTION

Considerable attention has been accorded the coding and formatting of
displayed information, particularly in the context of human/computer interface
design. Typically, however, the assessment has involved some clearly defined
performance criterion, viz., speed or accuracy. Thus formats that result in
fewer errors or quicker responses are deemed superior to others (see, e.g.,
Baker & Goldstein, 1966; Cicchinelli & Lantz, 1978; Coffey, 1961; Grace, 1966;
Hammond, 1971; Hitt, Schutz, Christner, Ray, & Coffey, 1961; Klemmer & Frick,
1953; Tullis, 1981; Wright, 1968). However, there are many applied contexts in
which performance is not so easily indexed, notably those involving judgment
and/or decision. This is especially true of decisions made under uncertainty
since decision outcomes can rarely be used to gauge the quality of the decision
making process (Einhorn, 1980; Einhorn & Hogarth, 1981; Lichtenstein, Fischhoff,
& Phillips, 1977). In such situations it becomes more meaningful to focus on
the way decisions are made (the process itself) rather than what they are
(the decision product). Put another way, other aspects of decision performance
(e.g., reliability) become more salient than accuracy per se. But,
unfortunately, relatively little is known about the effect of format on the
decision process. Before examining the rather sparse evidence on this topic,
it might be well to review the most commonly used paradigm for studying
judgment and decision behavior in the absence of a clear external criterion:
the policy capturing paradigm.

The basic approach in policy capturing involves the application of
regression analysis to actual judgments in an effort to infer how the
individual weights and combines items of predictive information in forming
those judgments.¹ Suppose, for example, the judgment of interest was a
commander's evaluation of enemy threat based on surveillance reports, monitored
communications, political analyses, and other intelligence information. To
determine how much weight he attaches to each predictive item ("cue"), we might
ask him to make a series of threat assessments for hypothetical intelligence
reports comprised of combinations of the relevant cue values. By regressing
the obtained threat judgments on the cue values in a multiple regression
analysis, we would describe his weighting "policy". That is, the resulting
regression weights would reflect how much importance he tended to attach to the
individual cues; the heavier the weight, the greater the importance of that
particular cue in determining his judgments. A summary of measures used to
describe various aspects of the so captured "policy" is given in Table 1. Of
course, the policy capturing approach can also be applied to naturally
occurring judgments, although in doing so one must contend with a number of
analytic problems (see, e.g., Dawes, 1971; also cf. Ebbesen & Konecni, 1980;
Phelps & Shanteau, 1978).
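
The regression step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the analysis code used in the studies: the simulated judge, the "true" weights, the noise level, and the helper name `capture_policy` are all hypothetical.

```python
import numpy as np

def capture_policy(cues, judgments):
    """Regress judgments on cues by ordinary least squares.

    Hypothetical helper: returns (intercept, raw regression weights, R^2).
    """
    cues = np.asarray(cues, dtype=float)
    judgments = np.asarray(judgments, dtype=float)
    design = np.column_stack([np.ones(len(cues)), cues])  # intercept column
    coef, *_ = np.linalg.lstsq(design, judgments, rcond=None)
    predicted = design @ coef
    ss_error = np.sum((judgments - predicted) ** 2)
    ss_total = np.sum((judgments - judgments.mean()) ** 2)
    return coef[0], coef[1:], 1.0 - ss_error / ss_total

# Simulated judge (assumed values): three cues weighted 0.6, 0.3, and 0.1,
# plus random inconsistency in the judgments.
rng = np.random.default_rng(0)
cues = rng.normal(size=(80, 3))
true_weights = np.array([0.6, 0.3, 0.1])
judgments = 5.0 + cues @ true_weights + rng.normal(scale=0.2, size=80)

intercept, weights, r_squared = capture_policy(cues, judgments)
print(np.round(weights, 2), round(r_squared, 2))
```

The recovered weights approximate the simulated judge's policy, and R^2 indexes how consistently a linear combination of the cues reproduces the judgments.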


Table  1  about  here 


The policy capturing paradigm, then, provides a convenient way of
describing different aspects of judgment/decision performance in the absence of
any accuracy measure. However, despite the increasing presence of computer
elements in decision systems, relatively little is known about the effects of
display format on individual judgment policies. An early study by Knox and
Hoffman (1962) examined the effect of profile format on judgments of
intelligence and sociability. The cue values were displayed graphically either
as T-scores (with a mean of 50 and standard deviation of 10) or as percentile
scores. It was found that subjects responded "... not only to the underlying
meaning of the scores, but to the position of the points on the profile in some
absolute sense" (p. 19). Extreme cue values produced more variable judgments
when expressed as percentile scores than as T-scores (which tended to appear
"squeezed in"). Percentile scores resulted in more reliable judgments and
higher values of the multiple correlation. However, the regression weights did
not differ as a function of format.

In  another  format  comparison,  Anderson  (1977)  found  that  judgments  of 
teacher  quality  showed  lower  linear  consistency  for  verbal  than  for  numerical 
cue profiles, but that the pattern of weights for the two formats was similar.

One methodological point related to both of these studies concerns the use
of standardized regression weights to compare cue weighting under different
formats. By definition, standardized weights measure the amount of change in
the criterion (Y) in standard deviation (SD) units as a function of one SD unit
change in the predictor (Xi). Although independent of scale, expressing change
in the criterion in terms of SD units (as standardized weights do) may not
always reflect actual changes in cue weighting. Suppose, for example, that a
raw score regression weight (bi) increases as a function of some experimental
manipulation. Such an increase would produce a direct increase in the SD of
the criterion (SDy) as well. But if we then express that weight in
standardized form (Bi), the increase in the raw score weight is offset by the
increase in SDy, and the real change in cue weighting may not be apparent (see
Equation 1).

$$B_i = b_i \, (SD_x / SD_y) \qquad \text{(Equation 1)}$$

Lane, Murphy, and Marques (1982) make a convincing argument that the raw
score regression weight adjusted by the SD of the cues (SDx) may be a more
appropriate alternative when one wishes to express cue weight independent of
the scale of cue measurement but without the problem of a concurrent SDy
effect.
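
A small numerical sketch may make the standardization problem concrete. It assumes the extreme case in which a manipulation doubles every raw weight and judgments are noise-free; the cue parameters, the raw weights, and the function names are hypothetical choices for illustration only.

```python
import numpy as np

def standardized_weights(b, sd_x, sd_y):
    """Equation 1: B_i = b_i * (SD_x / SD_y)."""
    return b * sd_x / sd_y

def adjusted_weights(b, sd_x):
    """Raw weights scaled by the cue SDs only (no SD_y term)."""
    return b * sd_x

rng = np.random.default_rng(1)
cues = rng.normal(loc=[25.0, 5.0], scale=[15.0, 2.0], size=(200, 2))
sd_x = cues.std(axis=0, ddof=1)

b_before = np.array([0.10, 0.50])  # assumed raw weights
b_after = 2.0 * b_before           # assumed manipulation: every weight doubles

# Noise-free judgments, so SD_y depends only on the weights.
y_before = cues @ b_before
y_after = cues @ b_after

B_before = standardized_weights(b_before, sd_x, y_before.std(ddof=1))
B_after = standardized_weights(b_after, sd_x, y_after.std(ddof=1))
adj_before = adjusted_weights(b_before, sd_x)
adj_after = adjusted_weights(b_after, sd_x)

print("standardized:", np.round(B_before, 3), np.round(B_after, 3))
print("SD-adjusted: ", np.round(adj_before, 3), np.round(adj_after, 3))
```

In this constructed case the standardized weights are identical before and after the manipulation because SDy doubles along with the raw weights, whereas the SD-adjusted weights double and so reveal the change.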

Although Anderson (1977) did not report the SD of judgments under the two
formats, Knox and Hoffman (1962) found judgments to be more variable in
response to percentile scores than to T-scores. It is therefore not
immediately obvious whether the failure to find differences in standardized
weights in these studies indeed implies that there were no differences in the
perceived importance of cues, or whether it was an artifact of the above
standardization problem.

Studies of policy capturing thus have offered inconclusive evidence on
whether display format affects subjects' cue weighting. Moreover, the
comparison formats used were either not readily interpretable (e.g., Knox &
Hoffman, 1962) or involved verbal and numerical displays (e.g., Anderson,
1977). The primary goal of the present research, then, was to determine the
effect of relatively common display formats on cue weighting and other aspects
of the subject's policy when external optimization criteria were not available.
The present studies simply compared the effect of two display formats,
numerical and graphic representation of cues, on subjects' overall judgments
and decisions based on those cues. These formats were chosen because they
represent two broad classes of structured displays; moreover, subjects'
familiarity with them makes them easily understood. It might be noted that
additional task variables were manipulated in conjunction with format in each
specific experiment, and that the task scenario was varied between experiments.
Scenarios that had "face validity" and that possessed features relevant to the
requirements of the experimental paradigm under consideration were chosen.
Further discussion of these methodological features is reserved for the
detailed account of each experiment.

EXPERIMENT  1 

Given  that  information  regarding  display  format  effects  in  a  policy 
capturing  paradigm  is  sparse,  this  experiment  was  simply  an  attempt  to 
determine  whether  a  gross  format  difference  (numerical  vs.  graphic  display) 
would  affect  either  judgments  or  choices  based  upon  identical  input  data. 
Subjects  were  required  to  process  multidimensional  stimuli  that  were  displayed 
numerically  and  graphically,  and  their  subsequent  responses  under  both  formats 
were  compared. 

Subjects performed two types of decision tasks: judgment and choice. A
number of investigators have suggested that the type of response required
(judgment or choice) influences how people process information (e.g., Einhorn &
Hogarth, 1981; Hammond, McClelland, & Mumpower, 1980; Payne, 1982), and thereby
produces substantially different kinds of decision behavior. In a judgment
task the subject is typically required to assign values to individual
alternatives as an expression of psychological worth (e.g., as a rating on a
scale or as a representative sum of money he/she would pay for an alternative)
whereas in choice the task is to select one or more preferred item(s) from a
set of alternatives. For example, evaluating the overall quality of a make and
model of an automobile on the basis of information such as size, m.p.g., and
cost constitutes judgment, whereas selecting a car for purchase from among those
available constitutes choice. Judgment and choice are clearly
interdependent in that choosing from a set of alternatives may well entail
judging them with respect to several dimensions; nevertheless, making a choice
involves explicit consideration of utilities (Edwards & Tversky, 1967), a
dimension that is not necessarily involved in judgment. In view of these
considerations, both types of response were examined for possible display
effects.

METHOD 

Task. The basic tasks required subjects either to rate the suitability
of applicants for the job of secretary or to decide whether they should be
hired (tasks that most subjects find both meaningful and realistic). More
specifically, subjects were presented with profiles of information about
hypothetical applicants which were comprised of four dimensions: intelligence,
motivation, skill, and experience. Each profile was represented in one of two
ways: as a set of numerical scores (numerical format) or as a set of bar
graphs (graphic format).

All  subjects  performed  the  rating  task  and  the  choice  task  under  both 
display  formats.  The  rating  task  simply  required  them  to  review  one  applicant 
profile  at  a  time  and  rate  it  on  a  suitability  scale  that  ranged  from  1 
(extremely  low)  to  10  (extremely  high).  Each  subject  rated  100  profiles  under 
each  format  condition.  In  the  choice  task,  two  applicant  profiles  were 
presented  together  and  subjects  had  to  indicate  which  one  of  the  two  applicants 
they considered more suitable for the job. Fifty applicant profile-pairs were
presented  for  choice  under  each  format  condition. 

Stimuli. Four sets of 100 applicant profiles (designated as p, q, r,
and s) were produced by a multivariate normal generator such that values on the
four cues were not intercorrelated. A multivariate array of deviates in the
range of 0 to 1 was produced. The deviates were further transformed such that
the actual values that defined the four cues (intelligence, motivation, skill,
and experience) were sampled from populations with means (and SDs) of 25 (15),
5 (2), 10 (3), and 3 (.5) subject to the constraint that the cue values ranged
between 1-50, 1-10, 1-25, and 1-5, respectively.
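
One way such stimulus sets could be produced is sketched below. The means, SDs, and ranges come from the description above; everything else is an assumption made for illustration. In particular, the source does not say how the range constraint was enforced, so redrawing out-of-range values is a hypothetical choice, as are the names `CUE_SPECS` and `generate_profiles`.

```python
import numpy as np

# Cue parameters from the text: (mean, SD, minimum, maximum).
CUE_SPECS = {
    "intelligence": (25.0, 15.0, 1.0, 50.0),
    "motivation": (5.0, 2.0, 1.0, 10.0),
    "skill": (10.0, 3.0, 1.0, 25.0),
    "experience": (3.0, 0.5, 1.0, 5.0),
}

def generate_profiles(n_profiles, rng):
    """Draw independent normal cue values, redrawing any outside the range.

    Redrawing is an assumption; the source only states that the values
    were constrained to the given ranges.
    """
    columns = []
    for mean, sd, low, high in CUE_SPECS.values():
        values = rng.normal(mean, sd, size=n_profiles)
        out_of_range = (values < low) | (values > high)
        while out_of_range.any():
            n_bad = int(out_of_range.sum())
            values[out_of_range] = rng.normal(mean, sd, size=n_bad)
            out_of_range = (values < low) | (values > high)
        columns.append(values)
    return np.column_stack(columns)  # shape: (n_profiles, 4)

rng = np.random.default_rng(42)
profile_sets = {name: generate_profiles(100, rng) for name in "pqrs"}
print(np.round(profile_sets["p"][:3], 1))
```

Because each cue is drawn independently, the cues are uncorrelated in expectation, although any single sample of 100 profiles will show small chance correlations. Restricting the range also reduces the realized SD somewhat, most noticeably for intelligence.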

Sets p and q were used in the rating task and sets r and s in the choice
task.  The  profiles  in  each  set  were  printed  on  unlined,  continuous  paper;  only 
one  profile  with  an  applicant  number  appeared  on  each  page  in  the  rating  task 
(sets  p  and  q),  but  two  profiles  designated  as  applicant  A  and  applicant  B  were 
presented  on  the  same  page  in  the  choice  task.  Each  set  of  profiles  was 
represented  numerically  and  graphically  in  separate  booklets. 

An illustration of the two formats for the rating task is shown in Figure
1. It might be noted that the four cue values were represented as raw scores
in the numerical display condition and as standardized scores in the graphic
condition. That is, the length of the bar for each graphic cue indicated its
value on scales adjusted to have comparable physical ranges (see asterisks
indicating the upper limit on each scale). This apparent confounding was
introduced in an effort to equate scales in terms of their ability to convey
the cue values properly. It is often the case that real world situations
require processing of numerical cues that are not equated on scale (e.g., GPA
and GRE scores in evaluating prospective graduate students) and presenting them
in standardized form would have thus destroyed a realistic feature of the task;
on the other hand, presenting the raw values in graphic form on scales with
radically different ranges would have been confusing from a perceptual
standpoint. Of course, the question of which definition of equivalence
(literal or perceptual) is more appropriate is actually an empirical one, and
one that was addressed in Experiment 2 (below).

Figure 1 about here

Design. A simple 2 x 2 factorial design was used with display format
(numerical vs. graphic) and type of decision task (judgment vs. choice) as
within-subjects variables. Both rating and choice tasks under a particular
format were performed in a block such that half of the subjects were presented
with a block of numerical profiles followed by a graphic block, and the other
half performed them in reverse order. The order in which the rating and choice
tasks were performed within each of these blocks was also counterbalanced.
Half of the subjects were presented with two particular sets of profiles in
numerical form (sets p and r) and the other two sets in graphic form (sets q
and s) whereas the reverse assignment was used for the remaining subjects.
These counterbalancing measures required eight subjects for each replication of
the design.
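
The eight counterbalancing conditions can be enumerated as the crossing of three two-level factors. The sketch below assumes that the rating/choice order was the same in both format blocks for a given subject, which the text implies (2 x 2 x 2 = 8) but does not state outright; the variable names are hypothetical.

```python
from itertools import product

# Factor 1: which format block comes first.
format_orders = [("numerical", "graphic"), ("graphic", "numerical")]
# Factor 2: task order within each block (assumed identical across blocks).
task_orders = [("rating", "choice"), ("choice", "rating")]
# Factor 3: which profile sets appear in which format.
set_assignments = [
    {"numerical": ("p", "r"), "graphic": ("q", "s")},
    {"numerical": ("q", "s"), "graphic": ("p", "r")},
]

conditions = list(product(format_orders, task_orders, set_assignments))

for number, (fmt_order, task_order, sets) in enumerate(conditions, start=1):
    print(number, fmt_order, task_order, sets)

print(len(conditions))  # number of subjects needed per replication
```

Each tuple in `conditions` describes one subject's sequence, so a full replication of the design needs one subject per tuple.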

Subjects. Forty-eight subjects were recruited from undergraduate
psychology courses at Rice University. They received either $4.00 or course
credit in exchange for their participation. An equal number of subjects was
assigned randomly to each of the eight conditions.

Procedure.  Up  to  six  subjects  participated  in  each  experimental  session 
which  lasted  about  an  hour.  Subjects  were  given  detailed  procedural 
instructions  with  special  attention  to  the  characteristics  of  the  cues  and  the 
way  they  were  represented  under  the  two  display  formats.  They  then  performed 
the  rating  and  choice  task  under  each  format  in  a  sequence  determined  by  the 
condition  to  which  they  were  assigned. 

Subjects performed the rating task paced by a "beeper" tone that sounded
at 8-second intervals; they were allowed 16 seconds for each pair of profiles
in the choice task. Both the rating and choice responses were written on
separate response sheets.

RESULTS  AND  DISCUSSION 

The  effects  of  format  were  evaluated  separately  for  the  rating  and  choice 
tasks  and  will  be  described  in  turn  in  the  following  sections. 

Rating Task. Of the 100 profiles rated under each format, 10 at either
end were used as buffer profiles, and responses to these were not analyzed.
The buffer profiles at the beginning were included to familiarize subjects with
the task and to allow them to develop a consistent rating strategy; those at
the end were included to reduce any effects of inattentiveness that might occur
toward the end of a session (Lane et al., 1982). For every subject, a separate
policy equation was obtained for the numerical and graphic displays by
regressing each type of judgment on the four cues.

The  raw  score  regression  weights  obtained  for  the  four  cues  under  a 
particular display condition serve as an index of the subjects' weighting of
those  cues  for  that  display.  Thus,  one  way  of  determining  the  effect  of  format 
on  judgment  is  simply  to  compare  these  regression  weights  using  a  repeated 
measures  ANOVA.  However,  since  the  cues  were  presented  on  different  scales,  it 
was  considered  essential  to  make  the  raw  score  weights  comparable  by 
multiplying  them  by  the  SD  of  each  cue.  These  adjusted  weights  represent  the 
magnitude  of  change  in  the  criterion  produced  by  a  change  of  one  SD  unit  in  the 
predictor  (cue)  (Lane  et  al.,  1982).  The  mean  adjusted  weights  are  shown  in 
Table  2. 


Table 2 about here

The main effect of format was not significant, F(1, 47) < 1, suggesting
that the average weights for all cues combined were comparable for the two
formats. Obviously this is less meaningful than the cue x format interaction
(which compares weighting policies for the formats); this interaction was
highly significant, F(3, 141) = 8.10, p < .0001. The main effect of cues
was also significant, F(3, 141) = 90.84, p < .0001. Clearly, therefore,
subjects weighted the four cues differently: intelligence received the highest
weight followed by motivation, skill, and experience. And although the average
weight for the cues did not differ across formats, the specific weights
attached to each cue were more uniform in the graphic than in the numerical
display. To explore the nature of this interaction further, individual t-tests
were conducted to test for differences in the weights for each of the four
cues. The weight for intelligence was reliably smaller for the graphic than
for the numerical display, t(47) = 2.29, p < .05; those for motivation and
experience were reliably larger, t(47) = 2.60, p < .02 and 4.90, p < .01,
respectively. The slight increase in the weight for skill was not
significant, t(47) < 1. What is particularly noteworthy about the observed
changes is that the fourth cue, experience, which had virtually no impact on
judgment under the numerical format did receive some weight when displayed
graphically. Coupled with this increase, the decrease in the highest weight
(for intelligence) produced the more even distribution of weights under the
graphic display format.

The suggestion that a graphic format encourages subjects to consider and
weight all cues whereas numerical presentation restricts their attention to a
subset of cues must, of course, be tempered by the fact that format and scale
representation were confounded in this study (see METHOD). It will be recalled
that the numerical cues were presented as raw scores, whereas the graphic cues
were presented in standardized form on physically identical scales. Such
rescaling of cues directly affected their variability. Consequently cues with
lower variability (viz., motivation, skill, and experience) may have appeared
to be more "scattered" in the graphic format, thereby inflating their cue
weights relative to the numerical format. This possibility, of course,
represents an alternative explanation for the results, particularly with
respect to the finding that experience, which had the lowest variance, was
weighted substantially more heavily under the graphic than the numerical
display. It was addressed directly in Experiment 2.

Besides regression weights (which index subjects' cue utilization),
another useful descriptive measure is the linear consistency of individual
policies, viz., the squared multiple correlation or $R^2_s$ obtained from
regressing judgments on the four cues. The overall difference between the
$R^2_s$ values obtained from the numerical and graphic policies (0.64 vs. 0.69)
was not significant, F(1, 47) = 2.69, p > .10. However, it should be noted that
the variance in $R^2_s$ can be partitioned as follows:

$$R^2_s = SS_{\hat{y}} / SS_y \qquad \text{(Equation 2)}$$

but,

$$SS_y = SS_{\hat{y}} + SS_e \qquad \text{(Equation 3)}$$

so

$$R^2_s = SS_{\hat{y}} / (SS_{\hat{y}} + SS_e) \qquad \text{(Equation 4)}$$

Thus a comparison of the two formats in terms of the $SS_{\hat{y}}$
(regression) and $SS_e$ (error) components was deemed more meaningful than the
overall $R^2_s$ index. Since SS measures tend to have skewed distributions, a
square-root transformation was applied to both $SS_{\hat{y}}$ and $SS_e$ for
purposes of analysis. Resulting t-tests showed that mean $\sqrt{SS_{\hat{y}}}$
values were not significantly different under the two display
conditions: 12.19 (numerical) vs. 11.61 (graphic), t(47) = 1.07, p > .10.
On the other hand, $\sqrt{SS_e}$ differences were highly significant: 8.90
(numerical) vs. 7.61 (graphic), t(47) = 3.59, p < .001. What this finding
suggests is that the graphic format produced considerably more precision in
judgment than did the numerical format, a conclusion that is reinforced by the
fact that variability in raw criterion judgments was also significantly lower
for the graphic display: SD = 1.58 vs. 1.74, t(47) = 3.63, p < .001.
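
The partition in Equations 2 to 4 can be computed directly from a fitted policy. The following sketch uses simulated judgments with two assumed noise levels standing in for the two formats; the weights, noise values, and the helper name `variance_components` are hypothetical and are not taken from the data reported here.

```python
import numpy as np

def variance_components(cues, judgments):
    """Return (SS_regression, SS_error, R^2) for an OLS fit with intercept."""
    cues = np.asarray(cues, dtype=float)
    judgments = np.asarray(judgments, dtype=float)
    design = np.column_stack([np.ones(len(cues)), cues])
    coef, *_ = np.linalg.lstsq(design, judgments, rcond=None)
    predicted = design @ coef
    ss_regression = np.sum((predicted - judgments.mean()) ** 2)
    ss_error = np.sum((judgments - predicted) ** 2)
    r_squared = ss_regression / (ss_regression + ss_error)  # Equation 4
    return ss_regression, ss_error, r_squared

rng = np.random.default_rng(7)
cues = rng.normal(size=(80, 4))
weights = np.array([0.8, 0.5, 0.3, 0.1])            # assumed policy
signal = 5.0 + cues @ weights
noisy = signal + rng.normal(scale=1.0, size=80)     # less precise judge
precise = signal + rng.normal(scale=0.5, size=80)   # more precise judge

ss_reg_noisy, ss_err_noisy, r2_noisy = variance_components(cues, noisy)
ss_reg_precise, ss_err_precise, r2_precise = variance_components(cues, precise)

print("sqrt(SS_reg), sqrt(SS_err):",
      np.round(np.sqrt([ss_reg_noisy, ss_err_noisy]), 2),
      np.round(np.sqrt([ss_reg_precise, ss_err_precise]), 2))
```

Separating the two components shows whether a difference in R^2 reflects a change in the systematic (regression) part of the judgments or in their error part, which a single R^2 value cannot distinguish.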

The analyses discussed so far assume that subjects' policies could be
described adequately in terms of a linear model. Since occasional instances of
nonlinearity have been reported (e.g., Einhorn, 1970; Einhorn, 1971; Wiggins &
Hoffman, 1968), a quadratic and a configural model were also applied to
subjects' judgments. Both models included as predictors the four cues (Xi) and
the coded format vector. In addition, the quadratic model included the four
squared values of cues (Xi²) and their interactions with the coded vector and
the configural model included 11 cross-products of cues (XiXj) and their
interactions with the coded vector. Results of these analyses indicated some
nonlinearity for a few subjects. However, even those who showed significant
nonlinearity could be described adequately in terms of a linear model: the
linear model alone accounted for 91.90% and 90.80% of the total variance
accounted for by the quadratic and configural models, respectively.²
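
The count of 11 cross-products is consistent with taking every product of two or more of the four cues (6 pairs, 4 triples, and 1 four-way term). That reading is an inference from the count, not something the text states, and the helper name `configural_terms` below is hypothetical.

```python
from itertools import combinations

import numpy as np

def configural_terms(cues):
    """Build every cross-product of two or more cue columns.

    Returns the list of column-index tuples and the matrix of products.
    """
    cues = np.asarray(cues, dtype=float)
    n_cues = cues.shape[1]
    index_sets = [
        combo
        for size in range(2, n_cues + 1)
        for combo in combinations(range(n_cues), size)
    ]
    products = np.column_stack(
        [cues[:, list(combo)].prod(axis=1) for combo in index_sets]
    )
    return index_sets, products

# Five made-up profiles with four cues each, for illustration only.
cues = np.arange(1, 21, dtype=float).reshape(5, 4)
index_sets, products = configural_terms(cues)
print(len(index_sets), products.shape)
```

These product columns would be appended to the four linear cue terms (and, in the analysis described above, crossed with the coded format vector) before fitting the configural model.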

Choice Task. The primary question of interest here was whether choice
performance differed significantly with format. Since there was no external
criterion available to define choice accuracy, the subjects' own numerical and
graphic rating policies were used as criteria. That is, "policy captured"
weights were applied to the cue values for each pair of choice profiles to
determine which profile should be chosen if the individual were consistent with
his/her own policy. These predicted choices were then compared to actual
choices under the two formats to obtain "accuracy" measures. Since there were
two policies (numerical and graphic) for each set of values, it was also
possible to compare decision "accuracy" for consistent criteria (e.g., actual
numerical choices evaluated with reference to a numerical policy) with those
for inconsistent criteria (e.g., actual graphic choices evaluated against a
numerical policy). These "accuracy" scores were analyzed in a 2 x 2 ANOVA
design with format and consistency of rating policy as the two within-subjects
variables. The mean accuracy scores are reported in the first two rows of
Table 3.
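
The scoring procedure just described can be sketched as follows. The policy weights and profile pairs are made up, and the simulated "actual" choices are constructed to follow the numerical policy exactly, so the example shows only the mechanics of the consistent and inconsistent comparisons, not the reported results. The function names are hypothetical.

```python
import numpy as np

def predicted_choices(weights, profiles_a, profiles_b):
    """True where the policy scores applicant A above applicant B.

    The regression intercept is omitted because it cancels in the comparison.
    """
    score_a = np.asarray(profiles_a, dtype=float) @ weights
    score_b = np.asarray(profiles_b, dtype=float) @ weights
    return score_a > score_b

def choice_accuracy(weights, profiles_a, profiles_b, chose_a):
    """Proportion of actual choices matching the policy's predictions."""
    predicted = predicted_choices(weights, profiles_a, profiles_b)
    return float(np.mean(predicted == np.asarray(chose_a)))

rng = np.random.default_rng(3)
profiles_a = rng.normal(size=(50, 4))  # 50 pairs, four cues per applicant
profiles_b = rng.normal(size=(50, 4))
numerical_policy = np.array([0.9, 0.4, 0.2, 0.0])  # assumed weights
graphic_policy = np.array([0.6, 0.5, 0.3, 0.3])    # assumed weights

# Simulated numerical-format choices that follow the numerical policy exactly.
actual_numerical = predicted_choices(numerical_policy, profiles_a, profiles_b)

consistent = choice_accuracy(numerical_policy, profiles_a, profiles_b, actual_numerical)
inconsistent = choice_accuracy(graphic_policy, profiles_a, profiles_b, actual_numerical)
print(consistent, inconsistent)
```

With real subjects the consistent score falls below 1.0 because choices are not perfectly predictable from the rating policy; the comparison of interest is how far the inconsistent score falls below the consistent one.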


Table  3  about  here 


Neither the effect of format nor the interaction between format x
consistency of policy was significant, F(1, 47) = .58 and 1.60, p = .45 and
.21, respectively. This suggests that despite the differences in subjects'
rating policies under the two formats, they predicted choices with similar
levels of accuracy. There was, however, a significant effect of consistency,
F(1, 47) = 9.43, p < .01. Although the absolute differences were extremely
small, a consistent policy predicted slightly better than an inconsistent one.
This implies that subjects' rating and choice behavior were more similar when
information was displayed in identical than in different formats. Thus while
numerical and graphic cues were processed differently, the same display mode
induced similar kinds of processing for both rating and choice tasks.


Since some have argued that, from a practical standpoint, judgment is
predicted as well by applying unit weights to the cues as it is with "policy
captured" or derived "importance" weights (Dawes & Corrigan, 1974; Dawes, 1979;
Einhorn & Hogarth, 1975), it is of interest to compare the efficacy of the two
models for the present data. Hence predictions from a unit-weighted model were
obtained separately for the numerical and graphic profiles and compared to
subjects' actual choices under those formats. The resulting accuracy scores
are reported in the third row of Table 3. For both graphic and numerical
formats, the consistent rating policy predicted better than a unit-weighted
model, t(47) = 5.06 and 2.66, p < .001 and p < .05, respectively. However, the
inconsistent policy predicted better than the unit-weighted model only for the
graphic format, t(47) = 2.35, p < .05; for the numerical format, the
difference was not reliable, t(47) = 1.95, p > .05. The finding that the
consistent policy for both formats fared reliably better than the inconsistent
one corroborates the conclusion that subjects processed stimulus profiles
similarly (regardless of task) under a particular display. It also implies
that the regression policy did capture something important about the subject's
behavior under a particular display format.
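
A unit-weighted prediction of the kind compared above might be computed as sketched here. The source does not describe how the unit-weighted model was implemented; this sketch assumes the common practice of standardizing each cue before summing with equal weights, and the function name and the single profile pair are hypothetical.

```python
import numpy as np

def unit_weight_choices(profiles_a, profiles_b, cue_means, cue_sds):
    """True where applicant A has the larger sum of standardized cue values.

    Standardizing before summing is an assumption of this sketch.
    """
    z_a = (np.asarray(profiles_a, dtype=float) - cue_means) / cue_sds
    z_b = (np.asarray(profiles_b, dtype=float) - cue_means) / cue_sds
    return z_a.sum(axis=1) > z_b.sum(axis=1)

# Population means and SDs for intelligence, motivation, skill, experience
# as given for the Experiment 1 stimuli.
cue_means = np.array([25.0, 5.0, 10.0, 3.0])
cue_sds = np.array([15.0, 2.0, 3.0, 0.5])

# One made-up pair: A is 1 SD above the mean on intelligence only;
# B is 1 SD above the mean on each of the other three cues.
profiles_a = np.array([[40.0, 5.0, 10.0, 3.0]])
profiles_b = np.array([[25.0, 7.0, 13.0, 3.5]])

unit_choice = unit_weight_choices(profiles_a, profiles_b, cue_means, cue_sds)
raw_choice = profiles_a.sum(axis=1) > profiles_b.sum(axis=1)
print(unit_choice, raw_choice)
```

In this constructed pair the standardized unit-weight model prefers applicant B, whereas summing the raw scores would prefer applicant A because the intelligence scale has the widest range. This is why the scaling decision matters when cues are presented on unequal scales.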

In  summary,  the  major  conclusion  to  be  drawn  from  this  study  is  that 
display  format  does  induce  differences  in  the  way  people  handle  predictive 
data, although as one might suspect, the processes involved are not necessarily
simple. 

EXPERIMENT  2 

The findings of Experiment 1 suggested a difference in pattern of cue
weighting for numerical and graphic formats. More specifically, there was a
tendency for the graphic format to produce a more even weighting of cues than
the numerical format. Whether this effect was the result of the display format
per se or the confounded difference in representation of scale values, however,
remained unclear (see earlier discussion). Of course, scale features are
themselves an aspect of display formatting, although it was not the aspect to
which Experiment 1 was primarily addressed.

Therefore, Experiment 2 sought to remove the confounding of scale with
display format effects. The design was similar to that of Experiment 1 except
that it was limited to the rating task, and all cues were represented on
comparable scales. The main purpose of this experiment, then, was to evaluate
the effect of format on the judgment of otherwise strictly equivalent
information.

METHOD 

Materials  and  Design.  Subjects  performed  a  rating  task  similar  to  the 
one  described  in  Experiment  1.  They  reviewed  profiles  of  hypothetical 
applicants for the job of secretary and rated them on a suitability scale that
ranged  from  1  (extremely  low)  to  10  (extremely  high).  Each  profile  contained 
information  about  the  applicant's  intelligence,  motivation,  social  skill,  and 
typing  ability. 

Two sets of 100 profiles (p, q) were generated in a manner identical to
that in Experiment 1 except that values on all dimensions (intelligence,
motivation, skill, and typing ability) were sampled from populations with a
mean of 25 and SD of 15. The scores were generated randomly subject to the
constraint that they ranged from 1 to 50. Both profile sets were represented
numerically and graphically.
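
The profile-generation step can be sketched in a few lines. The sketch below is only illustrative: it assumes normally distributed integer scores and enforces the 1-50 range by redrawing out-of-range values, neither of which is stated in this section, and the names sample_cue and make_profiles are hypothetical.

```python
import random

def sample_cue(rng, mean=25.0, sd=15.0, low=1, high=50):
    """Draw one integer cue value, redrawing until it falls in [low, high].

    Assumption: a normal population; the section gives only mean, SD, and range.
    """
    while True:
        value = round(rng.gauss(mean, sd))
        if low <= value <= high:
            return value

def make_profiles(n_profiles=100, n_cues=4, seed=0):
    """Return n_profiles lists of n_cues values (one list per applicant)."""
    rng = random.Random(seed)
    return [[sample_cue(rng) for _ in range(n_cues)] for _ in range(n_profiles)]

# Two independent sets, standing in for sets p and q.
set_p = make_profiles(seed=1)
set_q = make_profiles(seed=2)
```

Note that redrawing out-of-range values trims the tails, so the realized SD of such a set falls somewhat below the nominal 15.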

The format in which the information was displayed was varied within
subjects: under the numerical format the cue values were presented as
numerical scores, whereas under the graphic format they were presented as
horizontal bar graphs (refer to Figure 1). The presentation of the cues under
the numerical and graphic format was the same as Experiment 1, except that all
cues were represented on comparable scales and thus no standardization was
necessary for the graphic display. The sequence in which the numerical or
graphic information was displayed was counterbalanced such that half of the
subjects rated numerical followed by graphic profiles and the other half rated
them in the reverse order. The specific sets of profiles (p and q) used in the
numerical and graphic conditions were rotated so that they were represented in
both formats. Such counterbalancing resulted in four different conditions.

Subjects.  Twenty  Rice  University  students  served  in  the  experiment  for 
course  credit  toward  undergraduate  psychology  courses  or  for  pay.  They 
participated in the experimental sessions individually or in groups of 3-6.
Subjects  were  randomly  assigned  to  the  four  conditions  under  the  constraint 
that  an  equal  number  of  subjects  appeared  in  each  condition. 
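
The counterbalancing scheme (two format orders crossed with two assignments of profile sets to formats) and the equal-n random assignment can be sketched as follows. The function name assign_conditions and the subject numbering are hypothetical; only the four-condition structure and the equal-n constraint come from the text.

```python
import random
from collections import Counter

# Two presentation orders crossed with two set-to-format mappings
# give the four counterbalancing conditions.
ORDERS = [("numerical", "graphic"), ("graphic", "numerical")]
NUMERICAL_SETS = ["p", "q"]  # the graphic format receives the other set
CONDITIONS = [(order, num_set) for order in ORDERS for num_set in NUMERICAL_SETS]

def assign_conditions(subject_ids, seed=0):
    """Randomly assign subjects to conditions with equal numbers per condition."""
    ids = list(subject_ids)
    if len(ids) % len(CONDITIONS) != 0:
        raise ValueError("number of subjects must be a multiple of 4")
    rng = random.Random(seed)
    rng.shuffle(ids)
    per_condition = len(ids) // len(CONDITIONS)
    # After shuffling, consecutive blocks of subjects share a condition.
    return {sid: CONDITIONS[i // per_condition] for i, sid in enumerate(ids)}

assignment = assign_conditions(range(1, 21))  # 20 subjects, as in Experiment 2
counts = Counter(assignment.values())
```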

Procedure. After initial instructions regarding the task, subjects
received individual booklets in which the profiles to be rated were printed.
The sequence in which the numerical and graphic profiles were presented and
also the specific set of profiles reviewed was determined by the condition to
which the subject was assigned. Subjects reviewed and rated each of the 100
profiles in the numerical and graphic format at the rate of 8 seconds per
profile. A 5 minute rest interval was interposed between the rating of the two
sets.

RESULTS  AND  DISCUSSION 

As in Experiment 1, subjects rated 100 profiles under each format and
judgments for 10 buffer profiles at either end were not analyzed. Thus every
subject's policy equation for the two formats was based on judgments to the
remaining 80 profiles. The mean raw score regression weights obtained through
policy capturing are shown in Table 4. An ANOVA applied to these weights
showed a marginal effect of format, F(1, 19) = 3.68, p = .07; a significant
effect of cue, F(3, 57) = 4.33, p = .008; and a significant cue x format
interaction, F(3, 57) = 2.88, p = .04.
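
For readers unfamiliar with policy capturing, the computation behind each subject's policy equation can be sketched as an ordinary least-squares regression of the 80 retained ratings on the cue values. The sketch below is a minimal stdlib illustration, not the authors' software; the function names, the simulated judge, and the weights in true_w are all hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def capture_policy(profiles, ratings, n_buffer=10):
    """Return (intercept, raw score weights) from OLS on the non-buffer trials."""
    xs = profiles[n_buffer:len(profiles) - n_buffer]
    ys = ratings[n_buffer:len(ratings) - n_buffer]
    design = [[1.0] + list(x) for x in xs]  # leading 1.0 is the intercept term
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    coef = solve(xtx, xty)
    return coef[0], coef[1:]

# Hypothetical, noise-free judge used only to check the arithmetic.
rng = random.Random(0)
true_w = [0.08, 0.05, 0.03, 0.01]
profiles = [[rng.randint(1, 50) for _ in range(4)] for _ in range(100)]
ratings = [1.0 + sum(w * c for w, c in zip(true_w, p)) for p in profiles]
intercept, weights = capture_policy(profiles, ratings)
```

Because the simulated judge is perfectly linear, the recovered weights match true_w; real ratings would add error and lower the consistency of the fitted policy.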


Table  4  about  here 


These  results  replicate  the  primary  finding  of  Experiment  1— format  again 
produced  a  differential  weighting  of  cues.  Thus  subjects  do  indeed  process  the 
same  cues  differently  under  the  numerical  and  graphic  display  and  the  obtained 
interaction does not seem to be dependent specifically on the scaling features
peculiar to those in Experiment 1. In order to describe this interaction
precisely, individual t-values were obtained by comparing the mean weights for
each cue under the two formats. Only one was significant: the weight for
motivation displayed graphically was reliably larger than that displayed
numerically, t(19) = 2.53, p < .05; those for intelligence, skill, and
typing ability all failed to achieve significance, t(19) = 1.05, 1.87, and
1.46, p > .05 in each case. It will be noted that there were a number of
shifts in mean weighting of cues from Experiment 1 (see Tables 2 and 4).
Numerical  cues  received  a  higher  average  weight  in  this  experiment,  a  fact  that 
could  be  attributed  to  an  ease  of  processing  of  cues  due  to  equated  scale 
units; this point is obviously not applicable to the graphic format since all
cues were presented on comparable scales in both experiments (see Footnote 3).
Consequently, these shifts in mean weights obscure any tendency for graphic
cues to produce more even weighting than numeric ones. We can thus merely note
that in this
experiment,  the  weighting  of  individual  cues  differed  with  format,  but  in  no 
simply  described  pattern. 

Turning to the consistency of judgments (R²s), numerical policies were
less consistent than graphic ones (0.59 vs. 0.67); this difference was
reliable, F(1, 19) = 8.13, p < .01. R²s was broken down into its two
components, SSŷ and SSe, and a separate comparison of √SSŷ and √SSe was made
for the two formats. The mean √SSŷ for the numerical and graphic judgments
was 12.12 and 13.72, respectively, with a t-test on this difference showing
t(19) = 1.86, p < .10; √SSe for numerical judgments was larger than that for
graphic (11.21 vs. 9.97), t(19) = 2.57, p < .02. It thus appears that the lower
consistency of numerical judgments resulted largely from greater error in those
judgments than in the graphic ones. The finding that the numerical format
produced lesser precision than the graphic one parallels that of Experiment 1;
however, the lower precision did not affect the SDs of the judgments (1.90 vs.
1.89 for numerical and graphic formats, t(19) < 1). A summary of R²s and its
component measures for the two experiments is provided in Table 5.
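
The decomposition of consistency into its components follows the definitions in Table 1 and can be illustrated with a small worked example. The single-cue data below are invented purely to show the arithmetic, and the helper names are hypothetical; the identity SSy = SSŷ + SSe holds for any least-squares fit that includes an intercept.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single cue."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def sum_sq_dev(values):
    """Sum of squared deviations of values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

# Invented cue values and ratings, for illustration only.
cue = [1, 2, 3, 4, 5, 6]
ratings = [2, 3, 3, 5, 4, 6]

intercept, slope = fit_line(cue, ratings)
predicted = [intercept + slope * x for x in cue]
residuals = [y - p for y, p in zip(ratings, predicted)]

ss_pred = sum_sq_dev(predicted)   # SS of predicted responses
ss_err = sum_sq_dev(residuals)    # SSe
ss_total = sum_sq_dev(ratings)    # SSy
r_squared = ss_pred / ss_total    # linear consistency

# The text compares the square roots of the two components across formats.
root_pred, root_err = math.sqrt(ss_pred), math.sqrt(ss_err)
```

Two policies can therefore share the same R²s for different reasons: one through weaker cue usage (smaller predicted-response component), the other through larger error.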


Table  5  about  here 


Summarizing  the  first  two  experiments,  it  appears  that  format  does 
influence the manner in which people weight cues, but the nature of this
influence is not simply described. It may, in fact, be quite idiosyncratic.
Nonetheless, one generalization does emerge: judgment is less consistent under
the numerical format, and this is attributable chiefly to the lower precision
of numerical judgments relative to graphic ones.


EXPERIMENT 3

This experiment was simply an attempt to elucidate further the underlying
nature of the differences produced by the numerical and graphic formats.
Research  on  the  effect  of  structural  properties  of  stimuli  on  perceptual  tasks 
has  suggested  that  some  stimulus  dimensions  are  perceived  holistically 
(integral dimensions), while others are perceived individually (separable
dimensions) (Garner, 1974). For example, the height and width of a rectangle
are combined holistically to produce perception of rectangular area (Felfoldy,
1974; Garner & Felfoldy, 1971; Lockhead, 1979). Later investigations have
generalized this result to include decision tasks, more particularly in the
multiple cue probability learning paradigm in which subjects acquire knowledge
regarding cue-criterion relations (Goldsmith & Schvaneveldt, 1982; Wickens &
Scott, 1983).

It  appears,  then,  that  the  graphic  format  might  encourage  a  holistic 
perception  of  cues  presented  together;  the  numerical  format,  however,  might 
produce serial processing. It was postulated that the simultaneous
presentation of cues in the first two experiments may have favored a holistic
processing of graphic cues. If, then, cues were presented sequentially rather
than simultaneously, the holistic perception of graphic cues would be largely
eliminated and, as a consequence, so would the difference between the numerical
and graphic formats.

The present experiment, therefore, involved a sequential presentation of
cues under numerical and graphic formats. One additional manipulation involved
the number of cues presented to subjects. Previous investigators (e.g.,
Einhorn, 1971) have found that increasing the "information load" increases the
difficulty of integrating cues and thus is detrimental to performance. In a
sequential presentation of cues, subjects are obliged to rely heavily on memory
in making their judgments or choices, thus exacerbating the difficulty. By
varying the number of cues, therefore, the interest was to provide an adequate
range of task difficulty for the appearance of any potential format effects.

METHOD

Materials and Design. As in Experiments 1 and 2, subjects rated
multidimensional stimuli on a global dimension. However, the present task
involved teaching effectiveness judgments instead of the personnel
selection/rating tasks used previously. The primary reason for this change was
to explore the generality of display effects in another realistic judgment
context, while preserving the formal properties of the task. The stimuli
consisted of profiles of hypothetical college instructors whose performance was
described with respect to either four or six cues, and values of the cues were
displayed either numerically or graphically. The design, then, involved the
factorial combination of two variables: number of cues (four or six), which was
manipulated between-subjects, and display format (numerical vs. graphic), which
was manipulated within-subjects.

A set of 200 profiles was generated in a manner identical to that
described in Experiment 1 except that (1) six cue values were generated per
profile, and (2) all cues were sampled from populations with a mean of 30, a SD
of 10, and a range of 1-60. The six cues describing the profiles were
designated as information imparted in course, arousal of interest,
presentation style, knowledge of the field, rapport with students, and
clarity of course requirements. Subjects in the four-cue condition were
presented with a subset of these six cues; however, the exact subset of cues
was sampled independently for each subject. The order in which information
under the two formats was displayed was counterbalanced in both the four- and
six-cue conditions. Thus half of the subjects rated numerical profiles
followed by graphic profiles and the reverse was true for the remaining
subjects.

The  numerical  and  graphic  presentation  of  cues  was  similar  to  Experiments 
1  and  2.  As  in  Experiment  2,  all  cues  had  identical  scale  units  so  that  no 
transformation  of  the  cues  was  necessary  for  graphic  presentation.  For  every 
subject  100  profiles  were  chosen  randomly  for  graphic  presentation  and  the 
remaining  100  profiles  were  presented  numerically.  The  order  of  presentation 
of  the  cues  (for  the  four-  and  six-cue  conditions)  and  the  selection  of  the 
specific  four-cue  subsets  (for  the  four-cue  condition)  were  also  randomized 
individually.  Any  given  subject,  however,  reviewed  the  same  cues  in  a  specific 
order  under  both  types  of  display  format. 
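
The per-subject randomization just described can be sketched as follows. This is an illustrative reading of the procedure, with hypothetical names (plan_subject, the dictionary keys) and profiles identified simply by index 0-199.

```python
import random

CUES = [
    "information imparted in course",
    "arousal of interest",
    "presentation style",
    "knowledge of the field",
    "rapport with students",
    "clarity of course requirements",
]

def plan_subject(n_cues, seed):
    """Randomize cue subset, cue order, and profile-to-format split for one subject."""
    rng = random.Random(seed)
    # random.sample returns the chosen cues in random order, so one call
    # covers both subset selection (four-cue case) and presentation order.
    cue_order = rng.sample(CUES, n_cues)
    profile_ids = list(range(200))
    rng.shuffle(profile_ids)
    return {
        "cue_order": cue_order,           # same order used under both formats
        "graphic": profile_ids[:100],     # 100 profiles shown graphically
        "numerical": profile_ids[100:],   # remaining 100 shown numerically
    }

plan = plan_subject(n_cues=4, seed=3)
```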

Subjects.  The  experiment  was  conducted  in  individual  sessions  that 
lasted  for  about  an  hour.  Twenty  subjects,  enrolled  in  undergraduate 
psychology  courses  at  Rice  University,  participated  in  the  experiment  in 
exchange  for  course  credit  or  pay. 

Procedure.  The  subject  was  seated  in  a  cubicle  before  the  screen  of  a 
TRS-80  (Model  II)  microcomputer.  After  initial  instructions  regarding  the  task 
and  procedure,  the  profiles  of  instructors  were  displayed  on  the  screen  of  the 
computer,  one  profile  at  a  time.  Each  profile  was  presented  in  the  following 
manner. First the words "Instructor #" appeared on the screen along with the
number  of  the  profile  being  rated.  This  message  served  primarily  as  a 
preparatory  signal  for  subjects  to  attend  to  the  incoming  information  and  also 
to  distinguish  one  profile  from  another.  Then  the  cues  were  presented 
successively, at a 2 second rate, each with its label and value. After all
four or six cues were presented, the instructions "Rate the instructor on a
scale  of  1  to  10"  were  displayed  on  an  otherwise  blank  screen,  and  the  subject 
proceeded  to  write  down  his/her  rating  on  a  separate  response  form.  After  the 
response  had  been  recorded,  the  experimenter  depressed  a  programmed  key  on  the 
computer  keyboard  to  initiate  the  next  profile.  Thus,  although  the  time  of 
presentation  of  cues  was  controlled,  subjects'  responses  were  essentially 
self-paced.  A  brief  rest  period  intervened  between  the  ratings  of  two  sets  of 
profiles. 

RESULTS AND DISCUSSION

The  principal  issue  addressed  in  this  experiment  was  the  effect  of 
sequential  presentation  of  cues  on  subjects'  judgments  under  numerical  and 
graphic  display  formats.  The  expectation  was  that  the  sequential  procedure 
would  eliminate  the  display  effect  if,  in  fact,  the  primary  causative  factor 
was  holistic  processing. 

Of  the  100  profiles  rated  under  each  format,  ratings  of  10  buffer  profiles 
at  either  end  were  not  analyzed.  Both  numerical  and  graphic  policies  for  each 
subject  were  then  obtained  by  regressing  the  80  judgments  on  four  or  six  cues. 
The  raw  score  regression  weights  from  the  numerical  and  graphic  policies  were 
then  analyzed  in  two  ways:  for  order  effects  and  for  specific  cue  effects. 

The  first  type  of  analysis  pertained  to  the  weights  attached  to  the 
sequential  position  of  successively  presented  cues.  Note  that  order  of 
presentation  of  specific  cues  (e.g.,  information  imparted  in  course  or 
presentation  style)  was  randomized  individually  so  that  the  effect  of  cue  in 
this analysis does not pertain to a particular cue across subjects. Separate
ANOVAs were applied to the four- and six-cue data, which are presented in Table 6.

Table  6  about  here 

Looking first at the four-cue ANOVA, the means for the numerical and
graphic formats were .64 and .52 respectively, a difference that was
significant, F(1, 9) = 11.01, p < .01. Thus, cues tended to be weighted
more heavily on average under the numerical than the graphic display.
However, as predicted, the cue x format interaction did not approach
significance, F(3, 27) < 1. Given that this interaction was highly
significant under a simultaneous presentation of cues in both Experiments 1 and
2, the failure to find it in the present data lends indirect support to the
hypothesis that graphic display encourages holistic processing. But obviously,
this conclusion must be considered tentative due to the inherent danger in
accepting the null hypothesis. The effect of cue was only marginally
significant, F(3, 27) = 2.55, p = .08.

The six-cue ANOVA also failed to reveal a significant cue x format
interaction, F(5, 45) = 1.12, p = .37, thereby supporting the claim that a
sequential presentation eliminated the holistic processing of graphic cues.
However, the main effect of format found in the four-cue condition was absent
here, F(1, 9) < 1. Exactly why this should occur is not clear. There was
also no suggestion of the presence of cue effects, F(5, 45) = 1.23, p = .31.

The analyses discussed so far determined whether the processing of cues
was affected by the format in which they were presented and their temporal
ordering. The second analytic approach was based on cues irrespective of
order, the purpose being to establish whether the particular cues were
weighted differently under the two formats. This analysis was possible only
under the six-cue condition since the four-cue condition did not provide all
subjects with the same subsets of cues. Since neither the effect of format nor
the cue x format interaction was significant, F(1, 9) < 1 and F(5, 45) = 1.07,
p = .39, the regression weights were collapsed across format and these means
are presented in Table 7. As is apparent from Table 7, there was clearly an
effect of cue, F(5, 45) = 3.77, p = .006. As one would expect, some cues
were weighted more heavily than others.

Table  7  about  here 


In  sum,  there  was  no  evidence  for  a  differential  weighting  of  cues 
presented  sequentially  under  the  two  formats— the  cue  x  format  interaction 
found  consistently  in  the  first  two  experiments  was  eliminated  in  this  one. 
This  supports  our  hypothesis  of  holistic  processing  of  graphic  cues.  However, 
there  were  some  processing  differences  as  a  function  of  format;  the  numerical 
format  produced  larger  overall  cue  weights  than  the  graphic  format  in  the 
four-cue  condition. 

The consistency or R²s obtained from subjects' policies was compared in a
2 x 2 ANOVA design, with number of cues (four vs. six cues) as a
between-subjects variable and format (numerical vs. graphic format) as a
within-subjects variable. The mean R²s for the four- and six-cue conditions
was .67 and .57 respectively and the decline in consistency as the number of
cues increased from four to six was significant, F(1, 18) = 8.74, p = .008.
However, neither the effect of format nor the number of cues x format
interaction was significant, F(1, 18) = 1.90 and 1.62, p = .19 and .22.




The finding that an increase in information load affects R²s is supported by
previous studies (e.g., Anderson, 1977; Billings & Marcus, 1983; Einhorn,
1971). However, lowered consistency could result either from a decrease in cue
usage due to the greater amount of processing load imposed by additional cues
(measured by SSŷ) or an increase in random error (measured by SSe). Looking at
these components, only √SSŷ differed for the four- and six-cue conditions
(11.12 vs. 8.92), F(1, 18) = 4.33, p = .05; the difference between √SSe was
not reliable (7.74 vs. 7.58), F(1, 18) < 1. Format did not affect √SSŷ or √SSe
significantly and the number of cues x format interaction also failed to
approach significance for both measures. These findings suggest that subjects
who had a larger set of cues to process (six-cue condition) tended to use the
information less completely than those who had a smaller set (four-cue
condition), consequently lowering the linear consistency of their policies.
Whether the sequential presentation of cues imposed an additional memory load
and caused a greater decrement in consistency for the former condition relative
to a simultaneous presentation is not possible to determine from these data.

GENERAL DISCUSSION

Two common formats for displaying "cue information" were compared with
respect to their influence on judgment and choice behavior under different task
scenarios. The most important finding was that subjects weighted the same cues
differently when displayed numerically than they did when displayed in graphic
form. That is, their judgments and choices suggested that they attached
consistently more (or less) importance to particular items of information under
one format than under the other, and they did so irrespective of the task
scenario used (e.g., whether the judgment involved the suitability of job
candidates  or  the  evaluation  of  instructors). 


These  differences  disappeared,  however,  under  conditions  of  sequential 
cue  presentation  (Experiment  3),  a  situation  designed  to  minimize  the  holistic 
processing  tendency  believed  to  occur  with  the  graphic  format.  Thus  a 
necessary  condition  for  the  demonstration  of  format-induced  differences  is  the 
simultaneous  availability  of  cue  values.  Presumably,  people  tended  to  process 
numerical  information  serially  in  any  case,  while  they  may  operate  in  a  more 
holistic  (simultaneous  processing)  mode  if  multiple  graphic  inputs  are 
available  simultaneously.  Additional  evidence  for  the  holistic  processing 
hypothesis  was  obtained  in  Experiment  1,  where  the  graphic  display  produced 
more  uniform  cue  weightings  than  did  the  numerical  display.  However,  the 
Experiment 2 results were equivocal with respect to this tendency, so the
question  of  whether  a  graphic  format  encourages  the  operator  to  pay  more 
attention  to  more  of  the  available  cues  is  still  an  open  one.  Although 
holistic  processing  would  seem  logically  to  encourage  more  complete  use  of 
predictive  information,  it  does  not  follow  that  the  resulting  weights  must  be 
more  uniform  than  for  serial  processing. 

The present evidence for display-induced differences in cue weighting is
contrary to several previous reports (e.g., Anderson, 1977; Goldsmith &
Schvaneveldt, 1982; Knox & Hoffman, 1962; Wickens & Scott, 1983). The
difference may well reflect an important methodological point regarding the
calculation of regression weights. As noted earlier, the conventional method
uses standardized weights that may be insensitive to actual changes in cue
weighting. Consequently, a more powerful alternative measure, the raw score
regression weight, was used in the present analyses. The fact that it revealed
significant differences whereas the previous research had not lends credence to
the argument that the raw score weight is more sensitive and hence more
appropriate than standardized weights for measuring cue importance (Lane et


One other methodological point that has been virtually ignored in previous
research concerns R²s or the consistency of subjects' policies. R²s refers to
the proportion of variance in actual judgments that is accounted for by the
variance in predicted judgments (that are based on a weighted combination of
cues). Typically, studies have reported either R²s alone (e.g., Einhorn, 1971;
Knox & Hoffman, 1962) or R²s plus the variance of criterion judgments without
explicating their relationship (e.g., Anderson, 1977). As was illustrated by
the present research, important information concerning the effect of
experimental manipulations can be revealed by examining the components of
R²s: the variance of actual judgments (or, alternatively, SSy), the variance of
predicted judgments (or SSŷ), and the variance of error (or SSe). Thus in
Experiment 2, for example, the lower consistency of numerical vs. graphic
policies was shown to stem largely from the higher magnitude of error (SSe) in
those judgments. Since the components have independent meaning, it is obvious
that more can be learned about the underlying processes by analyzing SSŷ and
SSe separately than by merely reporting overall consistency (R²s).

In  sum,  the  present  research  serves  to  demonstrate  that  format  can  affect 
judgment  and  choice  behavior,  although  the  precise  nature  of  the  processing 
difference  was  not  established.  While  the  results  were  generally  consistent 
with  a  holistic-serial  processing  distinction,  they  did  not  prove  the  point. 

The  fact  that  the  present  findings  failed  to  confirm  null  format  results 
reported  elsewhere  is  attributed  to  use  of  insensitive  "importance  weighting" 
measures in that research. The more sensitive raw score measure, along with
several other methodological refinements for studying judgment performance, was
developed and illustrated in the three reported experiments.
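
One way in which standardized weights can mask a real change in cue weighting is easy to show numerically. The sketch below is a constructed illustration, not a reanalysis of any data reported here: it assumes two hypothetical noise-free policies whose raw score weights differ by a factor of two, and uses the textbook conversion of a raw weight to a standardized weight (raw weight times cue SD divided by judgment SD).

```python
import random
import statistics

def standardized_weights(raw_weights, cue_columns, ratings):
    """Convert raw score weights to standardized weights: b * SD(cue) / SD(ratings)."""
    sd_y = statistics.pstdev(ratings)
    return [b * statistics.pstdev(col) / sd_y
            for b, col in zip(raw_weights, cue_columns)]

# Hypothetical cue values for 80 profiles on two cues.
rng = random.Random(0)
cue_1 = [rng.randint(1, 50) for _ in range(80)]
cue_2 = [rng.randint(1, 50) for _ in range(80)]

# Two hypothetical policies: the second weights every cue twice as heavily.
raw_a = [0.5, 0.25]
raw_b = [1.0, 0.5]
ratings_a = [raw_a[0] * x1 + raw_a[1] * x2 for x1, x2 in zip(cue_1, cue_2)]
ratings_b = [raw_b[0] * x1 + raw_b[1] * x2 for x1, x2 in zip(cue_1, cue_2)]

std_a = standardized_weights(raw_a, [cue_1, cue_2], ratings_a)
std_b = standardized_weights(raw_b, [cue_1, cue_2], ratings_b)
```

Here the raw weights double from one policy to the other while the standardized weights are identical, because the SD of the judgments doubles along with them.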


FOOTNOTES 


1.  Traditionally  the  multiple  regression  model  in  policy  capturing  has 
been  used  for  the  primary  purposes  of  identifying  the  underlying  judgment 
"process"  and/or  predicting  judgment  "outcomes".  As  has  been  argued  elsewhere 
(see Kerkar, 1983), the usefulness of the paradigm can be enhanced considerably
if  it  is  applied  in  a  functional  manner.  Very  simply,  within  a  functional 
framework  the  regression  model  is  used  to  index  performance  in  decision  tasks 
with  varying  demands:  the  goal  is  to  relate  task  features  to  behavioral 
consequences  without  undue  emphasis  on  modeling  processes  or  capturing 
outcomes.  The  regression  model  has  been  used  within  such  a  functional 
framework  in  the  experiments  reported  here. 

2.  There  was  no  evidence  that  nonlinearity  varied  systematically  with 
display  format.  Since  a  similar  pattern  of  data  was  observed  for  Experiments  2 
and  3,  a  discussion  of  these  results  is  omitted. 

3.  This  observation  overrides  any  differences  in  cue  weighting  that  might 
arise  from  changes  in  cue  labels  from  Experiment  1  to  Experiment  2. 
TABLE 1

Descriptive measures obtained from policy capturing


bi   = raw score regression weight for cue 'i' obtained from the
       subject's policy, indicating the importance attached to that cue

R²s  = linear consistency of the subject's policy

SSŷ  = sum of squared deviations of the subject's predicted responses
       (Ŷs) from their mean

SSe  = sum of squared deviations of the residual in the subject's
       responses that could not be predicted from a weighted
       combination of the cues (Ys - Ŷs) from their mean

SSy  = sum of squared deviations of the subject's responses (Ys) from
       their mean, or (SSŷ + SSe)



Table 2


Mean raw score regression weights (adjusted) for the four cues under
the numerical and graphic formats (Experiment 1).


                                        Cues

                    Intelligence   Motivation   Skill   Experience

Numerical Format        1.12          .57        .37       .03

Graphic Format           .97          .70        .33       .15

Note: Adjusted regression weights were obtained by multiplying the raw
score weights and standard deviations of cue values to equate
scale differences among the four cues.
Table 3


Choice Accuracy based on a comparison of subjects' actual
choices under the two formats and choices predicted
from their regression models and a unit-weighted model.


                               Graphic Format   Numerical Format

Consistent rating policy           83.96             83.54

Inconsistent rating policy         80.63             82.86

Unit-weighted model                77.55             80.42


Table 6


Mean raw score regression weights for the successively presented cues.


Number of   Format                  Order of Presentation
Cues
                          1      2      3      4      5      6

Four        Numerical    .55    .51    .66    .85     --     --
            Graphic      .38    .45    .57    .67     --     --
            Mean         .46    .48    .61    .76     --     --

Six         Numerical    .35    .37    .24    .30    .54    .36
            Graphic      .29    .42    .28    .29    .46    .38
            Mean         .32    .40    .26    .30    .50    .37

Note: The means in the Table are adjusted by the SDs of the cues to
make the data consistent with those from Experiments 1 and 2.
Table 7


Mean raw score regression weights (collapsed across format)
for the specific cues in the six-cue condition.


Information   Arousal    Presentation   Knowledge   Rapport    Clarity of
imparted      of         style          of          with       requirements
in course     interest                  field       students

   .47          .56          .35           .32         .25         .19



Numerical  Display 


Intelligence  35 
Motivation  6 
Skill  16 
Experience  3 


Graphic Display


[Horizontal bar graph for each cue; labels legible in the source: MOTIV, SKILL, EXP]


FIGURE 1. Illustration of the numerical and graphic formats.